DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision
Leng, Yongqi, Lei, Yikun, Liu, Xikai, Zhong, Meizhi, Xiong, Bojian, Zhang, Yurong, Gao, Yan, Wu, Yi, Hu, Yao, Xiong, Deyi
Agentic Retrieval-Augmented Generation (Agentic RAG) enhances the processing capability for complex tasks through dynamic retrieval and adaptive workflows. Recent advances (e.g., Search-R1) have shown that outcome-supervised reinforcement learning demonstrates strong performance. However, this approach still suffers from inefficient exploration, sparse reward signals, and ambiguous global reward feedback. To address these challenges, we propose DecEx-RAG, which models RAG as a Markov Decision Process (MDP) incorporating decision-making and execution, while introducing an efficient pruning strategy to optimize data expansion. Through comprehensive process-level policy optimization, DecEx-RAG significantly enhances the autonomous task decomposition, dynamic retrieval, and high-quality answer generation capabilities of large language models (LLMs). Experiments show that DecEx-RAG achieves an average absolute performance improvement of $6.2\%$ across six datasets, significantly outperforming existing baselines. Moreover, the pruning strategy improves data construction efficiency by nearly $6 \times$, providing an efficient solution for process-supervised RAG training. The code is available at https://github.com/sdsxdxl/DecEx-RAG.
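The MDP framing described in the abstract, a decision step (retrieve more evidence or commit to an answer) followed by an execution step, can be sketched as a simple episode loop. This is a minimal illustration, not the paper's implementation: `RAGState` and the `policy`, `retrieve`, and `answer` callables are placeholder names standing in for learned components.

```python
from dataclasses import dataclass, field

@dataclass
class RAGState:
    """Episode state: the question plus evidence gathered so far."""
    question: str
    evidence: list = field(default_factory=list)

def rag_episode(question, policy, retrieve, answer, max_steps=4):
    """Run one MDP episode: at each step the policy decides whether to
    retrieve more evidence (execution appends a document to the state)
    or to commit to an answer."""
    state = RAGState(question)
    for _ in range(max_steps):
        if policy(state) == "answer":
            return answer(state)
        state.evidence.append(retrieve(state))
    return answer(state)  # forced to answer at the horizon
```

Process supervision would score each intermediate decision rather than only the final answer, which is what distinguishes this setup from outcome-supervised training.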
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Wang, Xiyao, Yang, Zhengyuan, Feng, Chao, Liang, Yongyuan, Zhou, Yuhang, Liu, Xiaoyu, Zang, Ziyi, Li, Ming, Lin, Chung-Ching, Lin, Kevin, Li, Linjie, Huang, Furong, Wang, Lijuan
Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from 200-word captions, we inject a single, subtle visual description error, altering a few words describing objects, attributes, counts, or spatial relations, and task the model to pinpoint the corrupted span given the image and the modified caption. This formulation preserves the full perceptual difficulty while providing a binary, exact-match reward that is easy to compute and unambiguous. Models trained with the ViCrit task exhibit substantial gains across a variety of VL benchmarks. Crucially, the improvements transfer beyond natural-image training data to abstract image reasoning and visual math, showing promise of learning to perceive rather than merely memorizing seen objects. To facilitate evaluation, we further introduce ViCrit-Bench, a category-balanced diagnostic benchmark that systematically probes perception errors across diverse image domains and error types. Together, our results demonstrate that fine-grained hallucination criticism is an effective and generalizable objective for enhancing visual perception in VLMs.
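The binary, exact-match reward the abstract describes is straightforward to sketch: the model earns 1 only when the predicted span matches the corrupted span. The normalization step here (lowercasing, collapsing whitespace) is an assumption for illustration; the paper's exact matching rule may differ.

```python
def vicrit_reward(predicted_span: str, corrupted_span: str) -> int:
    """Binary exact-match reward: 1 only if the model pinpoints the
    corrupted span exactly, after simple text normalization."""
    norm = lambda s: " ".join(s.lower().split())
    return int(norm(predicted_span) == norm(corrupted_span))
```

A reward of this shape is trivially verifiable, which is the property that makes the task usable as an RL objective.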
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers
Raza, Mohammad, Milic-Frayling, Natasa
Robustness of reasoning remains a significant challenge for large language models, and addressing it is essential for the practical applicability of AI-driven reasoning systems. We introduce Semantic Self-Verification (SSV), a novel approach that addresses the key challenge in combining language models with the rigor of logical solvers: to accurately formulate the reasoning problem from natural language to the formal language of the solver. SSV uses a consistency-based approach to produce strong abstract formalizations of problems using concrete instantiations that are generated by the model and verified by the solver. In addition to significantly advancing the overall reasoning accuracy over the state-of-the-art, a key novelty that this approach presents is a feature of verification that has near-perfect precision over a significant coverage of cases, as we demonstrate on open reasoning benchmarks. We propose such *near-certain reasoning* as a new approach to reduce the need for manual verification in many cases, taking us closer to more dependable and autonomous AI reasoning systems.
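The consistency check at the heart of SSV, accepting a formalization only when it agrees with solver-verified concrete instantiations, can be sketched as a small predicate. The callables here are placeholders: in the paper the instantiations come from the language model and the check is performed by a logical solver, not the toy functions below.

```python
def verify_formalization(formalization, instantiations, solver_check) -> bool:
    """SSV-style consistency check (sketch): accept a candidate
    formalization only if the solver confirms it holds on every
    concrete instantiation."""
    return all(solver_check(formalization, inst) for inst in instantiations)
```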
Think-then-Act: A Dual-Angle Evaluated Retrieval-Augmented Generation
Shen, Yige, Jiang, Hao, Qu, Hua, Zhao, Jihong
Despite their impressive capabilities, large language models (LLMs) often face challenges such as temporal misalignment and generating hallucinatory content. Enhancing LLMs with retrieval mechanisms to fetch relevant information from external sources offers a promising solution. Inspired by the proverb "Think twice before you act," we propose a dual-angle evaluated retrieval-augmented generation framework \textit{Think-then-Act}. Unlike previous approaches that indiscriminately rewrite queries, perform retrieval regardless of necessity, or generate temporary responses before deciding on additional retrieval (which increases generation costs), our framework employs a two-phase process: (i) assessing the input query for clarity and completeness to determine if rewriting is necessary; and (ii) evaluating the model's capability to answer the query and deciding if additional retrieval is needed. Experimental results on five datasets show that the \textit{Think-then-Act} framework significantly improves performance. Our framework demonstrates notable improvements in accuracy and efficiency compared to existing baselines and performs well in both English and non-English contexts. Ablation studies validate the optimal model confidence threshold, highlighting the resource optimization benefits of our approach.
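The two-phase control flow in the abstract, rewrite only if the query is unclear, then retrieve only if model confidence falls below a threshold, can be sketched as a small dispatcher. Every callable and the threshold value here are illustrative placeholders for the framework's learned or prompted components.

```python
def think_then_act(query, is_clear, rewrite, confidence, threshold,
                   answer_direct, answer_with_retrieval):
    """Two-phase sketch: (i) rewrite the query only when it is judged
    unclear or incomplete; (ii) answer directly when confidence is at
    or above the threshold, otherwise fall back to retrieval."""
    q = query if is_clear(query) else rewrite(query)
    if confidence(q) >= threshold:
        return answer_direct(q)
    return answer_with_retrieval(q)
```

Skipping retrieval when the model is already confident is where the efficiency gains the abstract mentions would come from.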
The Mail
Andrew Marantz's appraisal of two Silicon Valley camps that hold conflicting ideas about A.I.'s development--"doomers," who think it may spell disaster, and "effective accelerationists," who believe it will bring unprecedented abundance--offers a fascinating look at the factions that have dominated the recent discourse ("O.K., Doomer," March 18th). But readers should know that these two vocal cliques do not speak for the entire industry. Many in the A.I. and machine-learning worlds are working to advance technological progress safely, and do not suggest (or, for that matter, believe) that A.I. is going to lead society to either utopia or apocalypse. These people include A.I. ethicists, who seek to mitigate harm that A.I. has caused or is poised to inflict. Ethicists focus on concrete technical problems, such as trying to create metrics to better define and evaluate fairness in a broad range of machine-learning tasks.
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Radhakrishnan, Ansh, Nguyen, Karina, Chen, Anna, Chen, Carol, Denison, Carson, Hernandez, Danny, Durmus, Esin, Hubinger, Evan, Kernion, Jackson, Lukošiūtė, Kamilė, Cheng, Newton, Joseph, Nicholas, Schiefer, Nicholas, Rausch, Oliver, McCandlish, Sam, Showk, Sheer El, Lanham, Tamera, Maxwell, Tim, Chandrasekaran, Venkatesa, Hatfield-Dodds, Zac, Kaplan, Jared, Brauner, Jan, Bowman, Samuel R., Perez, Ethan
As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate step-by-step reasoning as they answer a question (Chain-of-Thought; CoT). The reasoning may enable us to check the process that models use to perform tasks. However, this approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. To improve the faithfulness of CoT reasoning, we have models generate reasoning by decomposing questions into subquestions. Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT while improving the faithfulness of the model's stated reasoning on several recently-proposed metrics. By forcing the model to answer simpler subquestions in separate contexts, we greatly increase the faithfulness of model-generated reasoning over CoT, while still achieving some of the performance gains of CoT. Our results show it is possible to improve the faithfulness of model-generated reasoning; continued improvements may lead to reasoning that enables us to verify the correctness and safety of LLM behavior.
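The key mechanism in the abstract, answering each subquestion in a separate context and then recomposing, can be sketched in a few lines. The three callables are placeholders for LLM calls; the point is the structure: `answer_sub` sees only its own subquestion, never the shared reasoning history.

```python
def decomposed_answer(question, decompose, answer_sub, compose):
    """Factored-decomposition sketch: split the question into
    subquestions, answer each in an isolated context, then compose
    the sub-answers into a final answer."""
    subquestions = decompose(question)
    sub_answers = [answer_sub(q) for q in subquestions]  # separate contexts
    return compose(question, sub_answers)
```

Isolating the contexts is what makes the stated reasoning harder to post-hoc rationalize, which is the faithfulness argument the paper makes.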
Deep Active Alignment of Knowledge Graph Entities and Schemata
Huang, Jiacheng, Sun, Zequn, Chen, Qijin, Xu, Xiaozhou, Ren, Weijun, Hu, Wei
Knowledge graphs (KGs) store rich facts about the real world. In this paper, we study KG alignment, which aims to find alignment between not only entities but also relations and classes in different KGs. Alignment at the entity level can cross-fertilize alignment at the schema level. We propose a new KG alignment approach, called DAAKG, based on deep learning and active learning. With deep learning, it learns the embeddings of entities, relations and classes, and jointly aligns them in a semi-supervised manner. With active learning, it estimates how likely it is that an entity, relation, or class pair can be inferred, and selects the best batch for human labeling. We design two approximation algorithms for efficiently solving the batch selection problem. Our experiments on benchmark datasets show the superior accuracy and generalization of DAAKG and validate the effectiveness of all its modules.
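The active-learning step the abstract describes, estimating how likely each pair can be inferred automatically and sending the rest to annotators, can be illustrated with a greedy stand-in. This is not the paper's approximation algorithm; it is a simple baseline that labels the pairs least likely to be inferred, and `inferable_prob` is a hypothetical estimator.

```python
def select_batch(candidates, inferable_prob, k):
    """Greedy active-learning sketch: pick the k candidate pairs with
    the lowest probability of being inferred automatically, i.e. the
    pairs where human labels add the most information."""
    return sorted(candidates, key=inferable_prob)[:k]
```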
Opinion
For a few years, I've been trying to write a story about a cat. A.I. will not be able to write this, partly because the story is still inside my imagination and on a few rough pages that were originally drafted in Boston, on sheets of notebook paper, as I sat in my daughter's apartment on a hot summer day. If I have it published (who knows, it's a strange story), perhaps some machine will suck it into a system, break down my style, my usage, the themes I like to touch upon -- loss and despair, love and hope -- wide-ranging themes that, like all themes, arrive out of my own unique human concerns and have fueled me through six story collections. But for now, this story I haven't yet finished is inside my imagination, safe and sound, and no machine can make it or conjure it because no machine has been in my head as I wandered the streets of South Chicago, or stared at Lake Michigan from Promontory Point on the particular day I was there in June, or stopped in the parking lot of a supermarket called Treasure Island to examine a pile of snow, left over from a long winter, honeycombed and covered with dirt and grime, which is the image that closes the rough draft of my story; no machine stood with me in front of the Obama house, on the corner of 1118 Hyde Park Boulevard, and watched a Secret Service agent as he approached, another image that sparked the plot of my story, and certainly no machine was with me watching a cat named Baudelaire, my daughter's cat, as he played on a particular Chicago afternoon, in a particular moment years ago, clutching a piece of string -- yet another image that spoke to me through the retrospect of memory. No machine -- and I use that phrase because A.I. 
is a machine, and no matter how complicated, or even organic, its still-binary, open-and-shut gates may be -- looked through my eyes as I took the train to my hometown in Michigan, gazing out over the old steel mills of Gary, Ind., making note of images with intent, storing and twisting them in relation to the pain I felt that moment, riding back to my hometown in Michigan, to my father's interment ceremony, an experience that reminded me that I, too, will die someday, and the art I create will be all I leave behind.
Follow the Timeline! Generating Abstractive and Extractive Timeline Summary in Chronological Order
Chen, Xiuying, Li, Mingzhe, Gao, Shen, Chan, Zhangming, Zhao, Dongyan, Gao, Xin, Zhang, Xiangliang, Yan, Rui
Nowadays, time-stamped web documents related to a general news query flood the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information retained, and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extractive summarization, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
Meaning without reference in large language models
Piantadosi, Steven T., Hill, Felix
The widespread success of large language models (LLMs) has been met with skepticism that they possess anything like human concepts or meanings. Contrary to claims that LLMs possess no meaning whatsoever, we argue that they likely capture important aspects of meaning, and moreover work in a way that approximates a compelling account of human cognition in which meaning arises from conceptual role. Because conceptual role is defined by the relationships between internal representational states, meaning cannot be determined from a model's architecture, training data, or objective function, but only by examination of how its internal states relate to each other. This approach may clarify why and how LLMs are so successful and suggest how they can be made more human-like.